Diary of a Backend Newbie Getting Beaten Up by MongoDB Indexes - Day 27 - Index Types

2022 iThome Ironman Contest

DAY 27

Self-Challenge Group

Part 27 of the series "Diary of a Backend Newbie Getting Beaten Up by MongoDB Aggregate"

14th Ironman Contest, mongodb, index

鰻魚燒

2022-09-27 23:51:22

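Yesterday's post covered several useful properties of indexes: besides speeding up queries, they offer different features that suit different situations. Today's post introduces the index types that are distinguished by the data type of the indexed field.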

Multikey Index

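When the indexed field holds an array, MongoDB automatically creates a multikey index. This type of index comes with some restrictions.
For example, in a compound index, each document may have at most one indexed field whose value is an array. Violating this restriction can fail in two ways.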

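The first is that the index cannot be created, because an existing document holds arrays in both indexed fields.

{ _id: 1, a: [ 1, 2 ], b: [ 1, 2 ] } // existing document
db.collection.createIndex({ a: 1, b: 1 }) // index we want to create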

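The second is that a write is rejected once the index exists, with the error message MongoError: cannot index parallel arrays [a] [b]

db.collection.createIndex({ a: 1, b: 1 }) // index that already exists
{ _id: 1, a: [1, 2], b: [1, 2], category: "A array" } // document we want to insert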

The following data, however, is allowed, because each document has only one indexed field whose value is an array.

{ _id: 1, a: [1, 2], b: 1 } 
{ _id: 2, a: 1, b: [1, 2] }
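
The rule above can be modelled in a few lines of plain JavaScript. This is only an illustrative sketch, not how MongoDB implements the check: the helper names `findParallelArrays` and `canIndex` are hypothetical, and the model assumes every index key is a top-level field.

```javascript
// Simplified model of the "parallel arrays" rule for a compound index.
// Hypothetical helpers, not part of any MongoDB API; only top-level fields are handled.
function findParallelArrays(keyPattern, doc) {
  // Collect the indexed fields whose value in this document is an array.
  return Object.keys(keyPattern).filter((field) => Array.isArray(doc[field]));
}

function canIndex(keyPattern, doc) {
  // A document is indexable when at most one indexed field holds an array.
  return findParallelArrays(keyPattern, doc).length <= 1;
}

const keyPattern = { a: 1, b: 1 };

console.log(canIndex(keyPattern, { _id: 1, a: [1, 2], b: [1, 2] })); // false
console.log(canIndex(keyPattern, { _id: 1, a: [1, 2], b: 1 }));      // true
console.log(canIndex(keyPattern, { _id: 2, a: 1, b: [1, 2] }));      // true
```

The check is per document, which is why the same index can accept an array in `a` for one document and an array in `b` for another.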

Text Indexes

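As the name suggests, a text index is built on text fields. It can be configured for different languages; according to the official documentation, the language determines how word roots are parsed (stemming) and which stop words are ignored.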

The default language associated with the indexed data determines the rules to parse word roots (i.e. stemming) and ignore stop words.

You can also assign a search weight to each field.

Suppose we have a collection dedicated to blog posts, with three text fields: content, about, and keywords. Sample data is shown below.


{
  _id: 1,
  content: "This morning I had a cup of coffee.",
  about: "beverage",
  keywords: [ "coffee" ]
}
{
  _id: 2,
  content: "Who doesn't like cake?",
  about: "food",
  keywords: [ "cake", "food", "dessert" ]
}

To create the index, use the following syntax.

db.blog.createIndex(
   {
      // list the fields to index
      // instead of 1 or -1 for sort order, use "text" to build a text index
      content : "text",
      keywords: "text",
      about: "text"
   },
   {
      default_language: "english", // sets the default language; it is English if omitted
      weights: { // weights sets the per-field weight
         content: 10,  // fields without an explicit weight default to 1
         keywords: 5
      },
   }
)

P.S. For the list of supported languages, see the official documentation.
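
Once a text index exists, it is queried with the `$text` operator, and the relevance score that the weights feed into can be projected and sorted on with `{ $meta: "textScore" }`. The sketch below only assembles the query pieces; `buildTextSearch` is a hypothetical convenience function, and the collection name `db.blog` in the trailing comment is an assumption.

```javascript
// Hypothetical helper that assembles the pieces of a text search.
// $text, $search and { $meta: "textScore" } are standard MongoDB query syntax.
function buildTextSearch(terms) {
  return {
    filter: { $text: { $search: terms } },
    projection: { score: { $meta: "textScore" } },
    sort: { score: { $meta: "textScore" } },
  };
}

const q = buildTextSearch("coffee cake");
console.log(JSON.stringify(q.filter)); // {"$text":{"$search":"coffee cake"}}

// In mongosh, against a collection that has the text index (assumed to be db.blog):
// db.blog.find(q.filter, q.projection).sort(q.sort)
```

With the weights above, a match in content should contribute more to the score than a match in keywords, and a match in about the least.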

Hashed Indexes

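When a hashed index is built, a hashing function converts the field's values into hash values, which are what the index stores. Keep the following restrictions in mind when using hashed indexes.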

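Multikey indexes are not supported: the indexed field cannot hold an array, otherwise an error occurs.
A hashed index cannot be created as a unique index.
Avoid building hashed indexes on fields that contain floating-point numbers, because the values are truncated to 64-bit integers before hashing; see the official documentation for details.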

Creating one is simple: write "hashed" as the value of the field to index.

db.collection.createIndex( { field: "hashed" } )
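
The floating-point caveat exists because, according to the MongoDB documentation, a hashed index truncates floating-point numbers to 64-bit integers before hashing, so 2.3, 2.2, and 2.9 all receive the same hash. The sketch below models only that truncation step with `Math.trunc`; the function names are hypothetical and no real hash is computed.

```javascript
// Models only the truncation that happens before hashing; no actual hash is computed.
// valueSeenByHash and groupCollisions are hypothetical names used for illustration.
function valueSeenByHash(value) {
  return typeof value === "number" ? Math.trunc(value) : value;
}

// Group values by what the hashed index would effectively hash.
function groupCollisions(values) {
  const groups = new Map();
  for (const v of values) {
    const key = valueSeenByHash(v);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(v);
  }
  return groups;
}

const groups = groupCollisions([2.3, 2.2, 2.9, 3.1]);
console.log(groups.get(2)); // [ 2.3, 2.2, 2.9 ]
console.log(groups.get(3)); // [ 3.1 ]
```

Values that land in the same group cannot be told apart by the hashed index alone, which is why fields with fractional values are a poor fit for it.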

Wildcard Indexes

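MongoDB supports dynamic schemas and does not enforce a fixed data format, so the shape of a field may change over time and it can be hard to predict what data will be added later. If you still want an index to speed up queries on such a field, you can use $** to create a wildcard index, which supports queries on unknown or arbitrary fields.

Suppose we have the following data.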

{ _id: 1, "userMetadata" : { "likes" : [ "dogs", "cats" ] } }
{ _id: 2, "userMetadata" : { "dislikes" : "pickles" } }
{ _id: 3, "userMetadata" : { "age" : 45 } }
{ _id: 4, "userMetadata" : "inactive" }

and we create a wildcard index on the userMetadata field

db.userData.createIndex( { "userMetadata.$**" : 1 } )

then the index can support the following queries.

db.userData.find({ "userMetadata.likes" : "dogs" })
db.userData.find({ "userMetadata.dislikes" : "pickles" })
db.userData.find({ "userMetadata.age" : { $gt : 30 } })
db.userData.find({ "userMetadata" : "inactive" })
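
To see why all four queries can be served, it helps to list the (path, value) pairs that a wildcard index on `userMetadata.$**` would cover for the sample documents. The sketch below is a simplified model, not MongoDB's implementation: `wildcardEntries` is a hypothetical helper, and its handling of arrays nested inside arrays is not meant to be faithful.

```javascript
// Simplified model of which (path, value) pairs a wildcard index on
// "userMetadata.$**" would cover. Hypothetical helper, not a MongoDB API.
function wildcardEntries(value, path) {
  if (Array.isArray(value)) {
    // Array elements are listed under the array's own path (no positional suffix).
    return value.flatMap((item) => wildcardEntries(item, path));
  }
  if (value !== null && typeof value === "object") {
    // Descend into embedded documents, extending the dotted path.
    return Object.entries(value).flatMap(([key, child]) =>
      wildcardEntries(child, `${path}.${key}`)
    );
  }
  // Scalars become entries as they are.
  return [[path, value]];
}

const docs = [
  { _id: 1, userMetadata: { likes: ["dogs", "cats"] } },
  { _id: 2, userMetadata: { dislikes: "pickles" } },
  { _id: 3, userMetadata: { age: 45 } },
  { _id: 4, userMetadata: "inactive" },
];

for (const doc of docs) {
  console.log(doc._id, JSON.stringify(wildcardEntries(doc.userMetadata, "userMetadata")));
}
// 1 [["userMetadata.likes","dogs"],["userMetadata.likes","cats"]]
// 2 [["userMetadata.dislikes","pickles"]]
// 3 [["userMetadata.age",45]]
// 4 [["userMetadata","inactive"]]
```

Each query above filters on one of these paths, so a single wildcard index covers fields that did not have to be known when the index was created.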

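If you are interested in indexes, see the official documentation. Many of today's examples come from the official site; the author could not think of other commonly used examples XD

This post is also published on my blog, so feel free to drop by when you have time.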